Benchmarking Commercial OCR Engines for Technical Drawings Indexing

Authors

  • J. C. Lecoq
  • Laurent Najman
  • Olivier Gibot
  • Éric Trupin

Abstract

The choice of a commercial Optical Character Recognition (OCR) engine is important for the process of automatically indexing technical drawings from their title blocks. We would like to benchmark commercial OCR engines with respect to their inclusion in the global digitalisation chain from scanning to understanding the text information contained in a technical drawing document. The crucial (costly) point is the manual correction of OCR recognition errors. By benchmarking, we intend to identify, for our application domain, the causes for OCR errors which are the most costly to

Similar resources

A Retrieval System for Graphical Documents

We present a method for indexing line drawings automatically. The indexing scheme is used for the retrieval of line drawings in a weighted information retrieval (IR) system. Being content-based, the indexing method depends not only on the graphical structures in the drawings, but on the textual entries as well. No a priori knowledge is used in the indexing scheme, since application-specific assu...

Automatic Indexing for Storage and Retrieval of Line Drawings

The usefulness of a collection of scanned graphical documents can be measured by the facilities available for their retrieval. We present an approach for indexing a collection of line drawings automatically. The indexing is based on the textual and graphical content of the drawings. This approach has been developed to facilitate 'retrieval by example' in heterogeneous collections of graphical doc...

Towards content-based retrieval of technical drawings through high-dimensional indexing

This paper presents a new approach to classify, index and retrieve technical drawings by content. Our work uses spatial relationships, visual elements and high-dimensional indexing mechanisms to retrieve complex drawings from CAD databases. This contrasts with conventional approaches which use mostly textual metadata for the same purpose. Creative designers and draftspeople often re-use data fr...

Content-Based Image Retrieval Systems: A Survey

In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. Many of these collections are the product of digitizing existing collections of analogue photographs, diagrams, drawings, paintings, and prints. Usually, the only way of searching these collections was by keyword indexing, or simply by browsing. Digital images databases however...

Reliable OCR solution for digital content re-mastering

This paper addresses the system’s aspects of OCR solutions in the context of digital content re-mastering. It analyzes the unique requirements and challenges to implement a reliable OCR system in a high-volume and unattended environment. A new reliability metric is proposed and a practical solution based on the combination of multiple commercial OCR engines is introduced. Experimental results s...

Publication date: 2001